Classification using partial least squares with penalized logistic regression

نویسندگان

  • Gersende Fort
  • Sophie Lambert-Lacroix
چکیده

MOTIVATION One important aspect of data-mining of microarray data is to discover the molecular variation among cancers. In microarray studies, the number n of samples is relatively small compared to the number p of genes per sample (usually in thousands). It is known that standard statistical methods in classification are efficient (i.e. in the present case, yield successful classifiers) particularly when n is (far) larger than p. This naturally calls for the use of a dimension reduction procedure together with the classification one. RESULTS In this paper, the question of classification in such a high-dimensional setting is addressed. We view the classification problem as a regression one with few observations and many predictor variables. We propose a new method combining partial least squares (PLS) and Ridge penalized logistic regression. We review the existing methods based on PLS and/or penalized likelihood techniques, outline their interest in some cases and theoretically explain their sometimes poor behavior. Our procedure is compared with these other classifiers. The predictive performance of the resulting classification rule is illustrated on three data sets: Leukemia, Colon and Prostate.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PLS and SVD based penalized logistic regression for cancer classification using microarray data

Accurate cancer prediction is important for treatment of cancers. The combination of two dimension reduction methods, partial least squares (PLS) and singular value decomposition (SVD), with the penalized logistic regression (PLR) has created powerful classifiers for cancer prediction using microarray data. Comparing with support vector machine (SVM) on seven publicly available cancer datasets,...

متن کامل

Classification of EEG recordings in auditory brain activity via a logistic functional linear regression model

We want to analyse EEG recordings in order to investigate the phonemic categorization at a very early stage of auditory processing. This problem can be modelled by a supervised classification of functional data. Discrimination is explored via a logistic functional linear model, using a wavelet representation of the data. Different procedures are investigated, based on penalized likelihood and p...

متن کامل

On the impact of model selection on predictor identification and parameter inference

We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors and the impact of predictor selection on parameter inference for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1), or by refitting selected predictors with standard regression (Algorithm 2). For linear ...

متن کامل

Ridge penalized logistical and ordinal partial least squares regression for predicting stroke deficit from infarct topography

Improving the ability to assess potential stroke deficit may aid the selection of patients most likely to benefit from acute stroke therapies. Methods based only on ‘at risk’ volumes or initial neurological condition do predict eventual outcome but not perfectly. Given the close relationship between anatomy and function in the brain, we propose the use of a modified version of partial least squ...

متن کامل

Asymptotic distribution and sparsistency for l1 penalized parametric M-estimators, with applications to linear SVM and logistic regression

Since its early use in least squares regression problems, the l1-penalization framework for variable selection has been employed in conjunction with a wide range of loss functions encompassing regression, classification and survival analysis. While a well developed theory exists for the l1-penalized least squares estimates, few results concern the behavior of l1-penalized estimates for general ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 21 7  شماره 

صفحات  -

تاریخ انتشار 2005